EXPONENTIAL  BOUNDS  OF  MEAN  ERROR
FOR  THE  NEAREST  NEIGHBOR  ESTIMATES 
OF  REGRESSION  FUNCTIONS 

L.  C.  Zhao 


ABSTRACT 


Let $(X,Y), (X_1,Y_1), \ldots, (X_n,Y_n)$ be i.i.d. $R^d \times R^1$-valued random vectors with $E|Y| < \infty$, and let $m_n(x)$ be a nearest neighbor estimate of the regression function $m(x) = E(Y|X=x)$. In this paper we establish an exponential bound of the mean deviation between $m_n(x)$ and $m(x)$ given the training sample $Z^n = (X_1,Y_1,\ldots,X_n,Y_n)$, under conditions as weak as possible. This is a substantial improvement on Beck's result.


Key  words.  Regression  function,  nearest  neighbor  estimate,  exponential 
bound,  mean  error,  training  sample. 




1.  INTRODUCTION 


Let $(X,Y), (X_1,Y_1), \ldots, (X_n,Y_n)$ be i.i.d. $R^d \times R^1$-valued random vectors with $E|Y| < \infty$. To estimate $m(x) = E(Y|X=x)$, the regression function of Y with respect to X, Stone (1977) and others proposed the so-called weighted estimate

(1)  $m_n(x) = \sum_{j=1}^{n} W_{nj}(x) Y_j$,

where $W_{nj}(x) = W_{nj}(x, X_1, \ldots, X_n)$ is a Borel-measurable function of its arguments. Let $V_{nj}$, $j = 1, \ldots, n$, be non-negative real numbers such that $\sum_{j=1}^{n} V_{nj} = 1$. For a suitably chosen metric $\|a-b\|$ on $R^d$ (such as $L_2$ or $L_\infty$), rearrange $(X_j, Y_j)$, $j = 1, \ldots, n$, as $(X_1^x, Y_1^x), \ldots, (X_n^x, Y_n^x)$ so that

(2)  $\|X_1^x - x\| \le \|X_2^x - x\| \le \cdots \le \|X_n^x - x\|$

(ties are broken by comparing indices), and set

(3)  $m_n(x) = \sum_{j=1}^{n} V_{nj} Y_j^x$.

Then we obtain the nearest neighbor (NN) estimates of $m(x)$.
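The construction of the NN estimate (sort the sample by distance to x, break ties by index, then take a weighted sum of the reordered responses) can be sketched numerically as follows. This is only an illustrative sketch: the function names `nn_estimate` and `knn_weights` are hypothetical and not from the paper, and the uniform weights are one possible choice of the $V_{nj}$.

```python
import numpy as np

def nn_estimate(x, X, Y, V, ord=np.inf):
    """Weighted nearest neighbor estimate: sum_j V[j] * Y_(j),
    where Y_(j) is the response of the j-th nearest neighbor of x.
    `ord` selects the norm (np.inf for L_infinity, 2 for L_2)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    V = np.asarray(V, dtype=float)
    dist = np.linalg.norm(X - np.asarray(x, dtype=float), ord=ord, axis=1)
    # A stable sort keeps the smaller index first among equal distances,
    # i.e. ties are broken by comparing indices.
    order = np.argsort(dist, kind="stable")
    return float(np.dot(V, Y[order]))

def knn_weights(n, k):
    """Hypothetical helper: uniform weights 1/k on the k nearest neighbors."""
    V = np.zeros(n)
    V[:k] = 1.0 / k
    return V

# Toy usage on a one-dimensional sample.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
Y = np.array([0.0, 1.0, 2.0, 3.0])
estimate = nn_estimate(np.array([1.1]), X, Y, knn_weights(4, 2))
```

Any other non-negative weight vector summing to one can be passed as `V` in the same way.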

Many authors have studied the convergence of these estimates from different points of view. For universal consistency, see, for example, Stone (1977); for pointwise moment consistency, see Devroye (1981); for pointwise a.s. consistency, see Devroye (1981) and Zhao and Bai (1984). In this paper, we study another mode of convergence of these estimates.

Write $X^n = (X_1, \ldots, X_n)$, $Y^n = (Y_1, \ldots, Y_n)$ and $Z^n = (X^n, Y^n)$. Let $g_n = g_n(x, Z^n)$ be an estimate of $m(x)$. In some problems, we are interested in the following mean deviation of $g_n$ given the training sample $Z^n$:

(4)  $D(g_n) = E\{|g_n(X, Z^n) - m(X)| \,|\, Z^n\} = \int_{R^d} |g_n(x, Z^n) - m(x)| Q(dx)$,

where Q denotes the distribution of X.
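In a simulation where both $m$ and a sampler for $Q$ are known, the integral in (4) can be approximated by averaging over fresh draws from $Q$. The sketch below is only illustrative: `mean_deviation_mc`, `sample_Q` and the other names are hypothetical, and the approach presupposes knowledge of $m$ and $Q$ that is not available in practice.

```python
import numpy as np

def mean_deviation_mc(g, m, sample_Q, n_mc=10_000, seed=0):
    """Monte Carlo approximation of D(g) = integral |g(x) - m(x)| Q(dx).

    g        : fitted estimate x -> g_n(x, Z^n), with the sample Z^n held fixed
    m        : true regression function (known only in simulations)
    sample_Q : callable (rng, size) -> array of shape (size, d) drawn from Q
    """
    rng = np.random.default_rng(seed)
    xs = sample_Q(rng, n_mc)
    return float(np.mean([abs(g(x) - m(x)) for x in xs]))

# Toy usage: Q uniform on [-1, 1], m(x) = x^2, and an estimate off by 0.5.
m = lambda x: float(x[0]) ** 2
g = lambda x: m(x) + 0.5
uniform_Q = lambda rng, size: rng.uniform(-1.0, 1.0, size=(size, 1))
approx_D = mean_deviation_mc(g, m, uniform_Q, n_mc=1000)
```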
Take $k = k_n \le n$, and put

(5)  $V_{nj} = 1/k$ for $1 \le j \le k$, and $V_{nj} = 0$ for $k < j \le n$.

For this class of estimates, Beck (1979) established the following theorem:

Suppose that the following conditions are satisfied:

(6)  (i) Y is bounded.

     (ii) $m(x)$ is continuous on $R^d$.

     (iii) Q has a continuous density f.

     (iv) $k \to \infty$ and $k/n \to 0$ as $n \to \infty$.

Then, for any given $\varepsilon > 0$,

$P\{D(m_n) > \varepsilon\} < e^{-Cn}$,

where $C > 0$ is a constant independent of n.

This theorem deals only with a special case of NN estimates, and the assumptions are rather restrictive. Recently, we substantially improved this result. We established the following:

Theorem 1. Let $m_n(x)$ be an NN estimate of $m(x)$ defined by (2) and (3). Suppose that the following conditions are satisfied:

(7)  (i) Y is bounded.

     (ii) Q has a density f.

     (iii) There exists a sequence of integers $k = k_n$ such that $k \to \infty$, $k/n \to 0$, $\sup_n (k \max_{1 \le j \le k} V_{nj}) < \infty$ and $\sum_{j>k} V_{nj} \to 0$.

Then for any given $\varepsilon > 0$, we have

$P\{D(m_n) > \varepsilon\} < e^{-Cn}$,

where $C > 0$ is a constant independent of n.


Note that the special case considered by Beck is included in this theorem. Moreover, this theorem substantially improves Beck's result by removing the continuity requirements on $m(x)$ and on $f(x)$, the density of Q.
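As a quick check that a uniform-weight special case falls under Theorem 1, consider the $k$-NN weights that put mass $1/k$ on each of the $k$ nearest neighbors (assumed here to be the case treated by Beck). The two weight conditions in (7)(iii) then hold trivially:

```latex
V_{nj} =
\begin{cases}
  1/k, & 1 \le j \le k, \\
  0,   & k < j \le n,
\end{cases}
\qquad\Longrightarrow\qquad
k \max_{1 \le j \le k} V_{nj} = 1 < \infty ,
\qquad
\sum_{j > k} V_{nj} = 0 .
```

So for these weights only $k \to \infty$ and $k/n \to 0$ remain to be assumed on the sequence $k = k_n$.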


2.  SOME  LEMMAS. 

Theorem 1 is valid for the $L_2$ norm or the $L_\infty$ norm on $R^d$; here we only give the proof for the $L_\infty$ norm. For simplicity, we make the following convention: $\varepsilon, \varepsilon_1, \varepsilon_2, \ldots, C, C_0, C_1, \ldots, \alpha, \beta_1, \beta_2, \delta$, etc., are all constants independent of n. $I_A$ or $I(A)$ denotes the indicator of a set A. $\#(A)$ denotes the cardinality of a set A. $S_{x,\rho} = \{u \in R^d: \|u-x\| < \rho\}$. $Q^*$ and $\lambda^*$ denote the outer measures generated by Q and the Lebesgue measure $\lambda$ (on $R^d$), respectively. We need the following lemmas in the sequel.

Lemma 1 (Besicovitch Covering Lemma). Let E be a bounded subset of $R^d$, and let K be a family of cubes covering E which contains a cube $D_x$ with center x for each $x \in E$. Then there exist points $\{x_k\}$ in E such that

(i) $E \subset \bigcup_k D_{x_k}$;

(ii) there exists a constant $\sigma$ depending only on d such that $\sum_k I(D_{x_k}) \le \sigma$.

Refer to Wheeden and Zygmund (1977), pp. 185-187.

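Let $Q_n$ be the empirical measure of $X_1, \ldots, X_n$, and let $T > 0$ be a given constant. Fix $\delta \in (0, 1/(2\sigma))$ and assume that $h = h_n \in (0,1)$. Set

(8)  $G_n = \{x \in S_{0,T}: Q_n(S_{x,h}) < \delta Q(S_{x,h})\}$

and

(9)  $E^* = \{x \in S_{0,T}: \beta_1 (2\rho)^d \le Q(S_{x,\rho}) \le \beta_2 (2\rho)^d$ for any $\rho \in (0,1)\}$,

where $\beta_1 > 0$ and $\beta_2 > 0$ are constants to be chosen later.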


Lemma 2. Suppose that Q has a density f. Then for any $\varepsilon > 0$, we can choose $\beta_1$ small enough and $\beta_2$ large enough such that $Q^*(S_{0,T} - E^*) < \varepsilon$.

Note that for any Borel-measurable set $E \subset E^*$, we have $\beta_1 \le f(x) \le \beta_2$ for almost all $x \in E$ ($\lambda$).

Lemma 3. Suppose that Q has a density f, $h = h_n \in (0,1)$ and $nh^d \to \infty$. Then for any given $\varepsilon > 0$, we have

$P\{Q^*(G_n) > \varepsilon\} < e^{-C_1 n}$.

Lemmas 2 and 3 can be deduced from Lemma 1. For the proof, see Zhao (1985).

Lemma 4. Suppose that $\int_{R^d} |g(x)|^p F(dx) < \infty$ for some $p > 0$. Then

$\lim_{h \to 0} \int_{S_{x,h}} |g(u)-g(x)|^p F(du) / F(S_{x,h}) = 0$

for almost all x (F).

Refer to Wheeden and Zygmund (1977), p. 191, example 20.

3.  PROOF  OF  THEOREM  1

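Suppose that $|Y| \le M$. Then

$\int_{R^d} |\sum_{j>k} V_{nj}(Y_j^x - m(x))| Q(dx) \le 2M \sum_{j>k} V_{nj} \to 0$

as $n \to \infty$. Without loss of generality, we can assume $\sum_{j>k} V_{nj} = 0$ for any n. Since $|m_n(x) - m(x)| \le 2M$ and $Q(R^d - S_{0,T/2})$ can be made arbitrarily small by taking T large, it is enough to prove that for each fixed $T > 0$,

(10)  $P\{\int_{S_{0,T/2}} |m_n(x) - m(x)| Q(dx) > \varepsilon\} < e^{-Cn}$.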

By Lemma 2, there exist $\beta_i = \beta_i(\varepsilon)$, $i = 1, 2$, and a compact set $E \subset E^*$ such that

(11)  $Q(S_{0,T} - E) < \varepsilon/(8M)$,

where $E^*$ is defined by (9).

Fix $\delta \in (0, 1/(2\sigma))$, as required in Lemma 3, and take $\alpha \ge (2^d \beta_1 \delta)^{-1}$. Set

$h = h_n = (\alpha k/n)^{1/d}$;

then $h \to 0$ and $nh^d \to \infty$ as $n \to \infty$.

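By Lemma 3, there exists a compact set $H_n$ such that, with h as above,

(12)  $H_n \subset \{x \in S_{0,T}: Q_n(S_{x,h}) \ge \delta Q(S_{x,h})\}$

and

(13)  $P\{Q(S_{0,T} - H_n) \ge \varepsilon/(8M)\} < e^{-C_1 n}$.

For $x \in H_n \cap E$, $Q_n(S_{x,h}) \ge \delta Q(S_{x,h}) \ge \beta_1 \delta \lambda(S_{x,h}) = \beta_1 \delta 2^d \alpha k/n \ge k/n$, so that the k nearest neighbors $X_1^x, X_2^x, \ldots, X_k^x$ of x all fall into $S_{x,h}$.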

Partition $R^d$ into sets of the form $\prod_{j=1}^{d} [(i_j - 1)h, i_j h)$, where $i_1, \ldots, i_d = 0, \pm 1, \ldots$. Call the partition $\Psi$. Set $\Psi' = \{B \in \Psi: B \subset S_{0,T}\}$. For $B \in \Psi'$, put

$\mathcal{W}(B) = \{B' \in \Psi: \rho(B,B') < 3h\}$,  $W(B) = \bigcup_{B' \in \mathcal{W}(B)} B'$,

where $\rho(B,B') = \inf\{\|x-x'\|: x \in B, x' \in B'\}$. Then there exists a constant $C_0$ such that for any $B \in \Psi'$ we have $\#(\mathcal{W}(B)) \le C_0$. It is easy to show by induction that $\Psi'$ can be divided into $C_2$ ($\le C_0$) disjoint subsets $\Psi_i$, $i = 1, \ldots, C_2$, such that for any two distinct sets $B_1$, $B_2$ in the same $\Psi_i$, we have

$W(B_1) \cap W(B_2) = \emptyset$.

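Denote by $B(x)$ the cube $B \in \Psi$ which contains x. If $x \in H_n \cap E$ and $B(x) \in \Psi'$, then for any $u \in B(x)$, we have $S_{x,h} \subset S_{u,2h} \subset W(B(x))$, so that, from $Q_n(S_{x,h}) \ge k/n$, it follows that the k nearest neighbors $X_1^u, \ldots, X_k^u$ of u are also contained in $W(B(x))$. If we write

$A_n = \{B \in \Psi': B \cap H_n \cap E \ne \emptyset\}$,

then, as mentioned above, for any $B \in A_n$, $W(B)$ contains the k nearest neighbors of each $x \in B$. Further, we set $\mathcal{H}_i = A_n \cap \Psi_i$, $i = 1, 2, \ldots, C_2$. It is easy to see that

$\int_{S_{0,T/2}} |m_n(x) - m(x)| Q(dx) \le \int_{S_{0,T}-E} + \int_{S_{0,T}-H_n} + \int_{H_n \cap E \cap S_{0,T/2}}$,

where each integral on the right has the same integrand $|m_n(x) - m(x)| Q(dx)$.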


By (11), we have

$\int_{S_{0,T}-E} |m_n(x)-m(x)| Q(dx) \le 2M Q(S_{0,T}-E) < \varepsilon/4$.

By (13),

$P\{\int_{S_{0,T}-H_n} |m_n(x)-m(x)| Q(dx) > \varepsilon/4\} \le P\{Q(S_{0,T}-H_n) > \varepsilon/(8M)\} < e^{-C_1 n}$.

Hence to prove (10), it is enough to prove that

(14)  $P\{\int_{H_n \cap E \cap S_{0,T/2}} |m_n(x)-m(x)| Q(dx) > \varepsilon/2\} < e^{-C_3 n}$.


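For large n,

$\int_{H_n \cap E \cap S_{0,T/2}} |m_n(x)-m(x)| Q(dx) \le \sum_{B \in A_n} \int_{B \cap E} |m_n(x)-m(x)| Q(dx) = \sum_{i=1}^{C_2} \sum_{B \in \mathcal{H}_i} \int_{B \cap E} |m_n(x)-m(x)| Q(dx)$.

Write

$\tilde m_n(x) = \sum_{j=1}^{k} V_{nj} m(X_j^x)$,

$I_{ni} = \sum_{B \in \mathcal{H}_i} \int_{B \cap E} |m_n(x) - \tilde m_n(x)| Q(dx)$,

$J_{ni} = \sum_{B \in \mathcal{H}_i} \int_{B \cap E} |\tilde m_n(x) - m(x)| Q(dx)$,  $i = 1, \ldots, C_2$,

$\Phi(B) = \int_{B \cap E} |\sum_{j=1}^{k} V_{nj}(Y_j^x - m(X_j^x))| Q(dx) / Q(B \cap E)$,

$d_{ni} = \#\{B \in \mathcal{H}_i: \Phi(B) \ge \varepsilon/(8C_2)\}$,  $i = 1, \ldots, C_2$.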

To prove (14), it is enough to show that, for each i, $1 \le i \le C_2$, we have

(16)  $P\{I_{ni} \ge \varepsilon/(4C_2)\} < e^{-C_4 n}$,

(17)  $P\{J_{ni} \ge \varepsilon/(4C_2)\} < e^{-C_5 n}$.

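For almost all $x \in B \cap E$ ($\lambda$), $f(x) \le \beta_2$, so that $Q(B \cap E) \le \beta_2 h^d = \beta_2 \alpha k/n$. Since $\Phi(B) \le 2M$, we get

$I_{ni} \le \varepsilon/(8C_2) + 2M d_{ni} \beta_2 \alpha k/n$.

Write $C_6 = \varepsilon (16 M C_2 \alpha \beta_2)^{-1}$; then

(18)  $P\{I_{ni} \ge \varepsilon/(4C_2)\} \le P\{d_{ni} \ge C_6 n/k\}$.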

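Now we proceed to prove that, for any $B \in \mathcal{H}_i$,

(19)  $P\{\Phi(B) \ge \varepsilon/(8C_2) \,|\, X^n\} < e^{-C_7 k}$,

where $X^n = (X_1, \ldots, X_n)$ is defined as before.

For any $\varepsilon_1 > 0$ and $s > 0$, by Markov's and Jensen's inequalities we have

(20)  $P\{\Phi(B) \ge \varepsilon_1 \,|\, X^n\} \le e^{-s\varepsilon_1} E\{\exp(s\Phi(B)) \,|\, X^n\}$
      $\le e^{-s\varepsilon_1} \int_{B \cap E} E\{\exp(s|\sum_{j=1}^{k} V_{nj}[Y_j^x - m(X_j^x)]|) \,|\, X^n\} Q(dx) / Q(B \cap E)$.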

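When $\{X_j^x, j \le k\}$ is given, $Y_1^x, \ldots, Y_k^x$ are independent. From this and the inequality $|e^t - 1 - t| \le \frac{1}{2} t^2 e^{|t|}$ for any real t, it follows that

$E\{\exp(s \sum_{j=1}^{k} V_{nj}[Y_j^x - m(X_j^x)]) \,|\, X^n\}$
$= \prod_{j=1}^{k} E\{\exp(s V_{nj}[Y_j^x - m(X_j^x)]) \,|\, X_j^x\}$
$\le \prod_{j=1}^{k} \{1 + s^2 C_8^2 k^{-2} \exp(2sC_8 k^{-1})\}$
$\le \exp\{s^2 C_8^2 k^{-1} \exp(2sC_8 k^{-1})\}$.

Here we have written $C_9 = \sup_n (k \max_{1 \le j \le k} V_{nj})$ and $C_8 = 2C_9 M$, so that $|V_{nj}[Y_j^x - m(X_j^x)]| \le C_8/k$. In the same way,

$E\{\exp(s \sum_{j=1}^{k} V_{nj}[m(X_j^x) - Y_j^x]) \,|\, X^n\} \le \exp\{s^2 C_8^2 k^{-1} \exp(2sC_8 k^{-1})\}$.

In view of (20) and the inequality $e^{|a|} \le e^{a} + e^{-a}$, we get

$P\{\Phi(B) \ge \varepsilon_1 \,|\, X^n\} \le 2 \exp\{-s\varepsilon_1 + s^2 C_8^2 k^{-1} \exp(2sC_8 k^{-1})\}$.

Taking $s = \gamma k$ with $\gamma$ small enough, we have

$P\{\Phi(B) \ge \varepsilon_1 \,|\, X^n\} < e^{-C_{10} k}$.

With $\varepsilon_1 = \varepsilon/(8C_2)$, this is just (19).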
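The choice $s = \gamma k$ can be made explicit. The following worked step is a sketch under the bound displayed above, writing $\varepsilon_1$ for the threshold and $C_8$ for the constant in the exponent; the particular requirement on $\gamma$ is one convenient sufficient condition, not necessarily the author's.

```latex
\begin{align*}
-s\varepsilon_1 + s^2 C_8^2 k^{-1} \exp(2 s C_8 k^{-1}) \Big|_{s=\gamma k}
  &= -k\,\gamma \bigl( \varepsilon_1 - \gamma C_8^2 e^{2\gamma C_8} \bigr), \\
\gamma C_8^2 e^{2\gamma C_8} \le \tfrac{1}{2}\varepsilon_1
  \;&\Longrightarrow\;
  P\{\Phi(B) \ge \varepsilon_1 \mid X^n\}
  \le 2\, e^{-\gamma \varepsilon_1 k / 2}
  \le e^{-C_{10} k}
  \quad \text{for all large } k,
\end{align*}
```

where $C_{10}$ may be any constant in $(0, \gamma\varepsilon_1/2)$; the factor 2 is absorbed because $k \to \infty$.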


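Since for each $B \in \mathcal{H}_i$, $W(B)$ contains the k nearest neighbors of each $x \in B$, and $W(B_1) \cap W(B_2) = \emptyset$ for any distinct $B_1, B_2 \in \mathcal{H}_i$, we see that when $X^n = (X_1, \ldots, X_n)$ is given, $\{\Phi(B), B \in \mathcal{H}_i\}$ is a family of conditionally independent variables. Put $G(B) = \{\Phi(B) \ge \varepsilon_1\}$ with $\varepsilon_1 = \varepsilon/(8C_2)$. Then by (19) and $\#(\mathcal{H}_i) \le \#(\Psi') \le C_{11} n/k$, we have

(21)  $P\{d_{ni} \ge C_6 n/k \,|\, X^n\}$
      $= P\{\bigcup_{H \subset \mathcal{H}_i, \#(H) \ge C_6 n/k} \bigcap_{B \in H} G(B) \,|\, X^n\}$
      $\le \sum_{H \subset \mathcal{H}_i, \#(H) \ge C_6 n/k} P\{\bigcap_{B \in H} G(B) \,|\, X^n\}$
      $= \sum_{H \subset \mathcal{H}_i, \#(H) \ge C_6 n/k} \prod_{B \in H} P\{G(B) \,|\, X^n\}$
      $\le \sum_{C_6 n/k \le j \le \#(\mathcal{H}_i)} \binom{\#(\mathcal{H}_i)}{j} (e^{-C_7 k})^j$
      $\le e^{-C_6 C_7 n} 2^{\#(\mathcal{H}_i)} \le e^{-C_6 C_7 n} 2^{C_{11} n/k} \le e^{-C_{12} n}$.

From (18) and (21) it follows that (16) is valid.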
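The combinatorial step in the chain above uses the crude bound $\sum_{j \ge m} \binom{N}{j} p^j \le p^m 2^N$ for $0 \le p \le 1$, applied with $p = e^{-C_7 k}$, $m = C_6 n/k$ and $N$ the number of cubes. A small numerical check of that inequality is sketched below; the function names are hypothetical and the parameter values are arbitrary.

```python
from math import comb

def binomial_tail_sum(N, m, p):
    """Exact value of sum_{j=m}^{N} C(N, j) * p**j."""
    return sum(comb(N, j) * p**j for j in range(m, N + 1))

def crude_bound(N, m, p):
    """Upper bound p**m * 2**N, valid for 0 <= p <= 1 because
    p**j <= p**m for j >= m and the binomial coefficients sum to 2**N."""
    return p**m * 2**N

# Compare the two quantities on a few arbitrary parameter choices.
checks = [
    (N, m, p, binomial_tail_sum(N, m, p), crude_bound(N, m, p))
    for (N, m, p) in [(20, 5, 0.01), (50, 10, 0.1), (30, 30, 0.5)]
]
```

The bound is far from tight, but it suffices here because $2^{C_{11} n/k} = e^{o(n)}$ when $k \to \infty$.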

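Now we proceed to prove (17). As mentioned above, for each $B \in \mathcal{H}_i$ and each $x \in B$, the k nearest neighbors $X_1^x, \ldots, X_k^x$ all fall into $W(B)$. Noticing the conditions imposed on the $V_{nj}$'s, and recalling that $\mathcal{H}_i \subset \Psi_i$ and $C_9 = \sup_n (k \max_{1 \le j \le k} V_{nj})$, we see that

(22)  $J_{ni} = \sum_{B \in \mathcal{H}_i} \int_{B \cap E} |\sum_{j=1}^{k} V_{nj}(m(X_j^x) - m(x))| Q(dx)$
      $\le C_9 k^{-1} \sum_{B \in \Psi_i} \sum_{j=1}^{n} I_{W(B)}(X_j) \int_{B \cap E} |m(X_j) - m(x)| Q(dx)$
      $= C_9 k^{-1} \sum_{B \in \Psi_i} \sum_{j=1}^{n} I_{W(B)}(X_j) Z_B(X_j)$,

where

(23)  $Z_B(u) = \int_{B \cap E} |m(u) - m(x)| Q(dx) \le 2M\beta_2 \alpha k/n$.

Here, the following facts are used: $|m(x)| \le M$, $f(x) \le \beta_2$ for $x \in B \cap E$, and $\lambda(B) \le h^d = \alpha k/n$.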

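Put $\varepsilon_2 = \varepsilon (8C_2 C_9)^{-1}$. To prove (17), it suffices to prove that

(24)  $P\{\sum_{B \in \Psi_i} \sum_{j=1}^{n} I_{W(B)}(X_j) Z_B(X_j) \ge 2k\varepsilon_2\} < e^{-C_{13} n}$.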

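Let N be a Poisson random variable with parameter n, which is independent of $X_1, X_2, \ldots$. If $|N-n| < n\varepsilon_3 = n\varepsilon_2/(2M\beta_2\alpha)$, then by (23)

$|\sum_{B \in \Psi_i} \sum_{j=1}^{N} I_{W(B)}(X_j) Z_B(X_j) - \sum_{B \in \Psi_i} \sum_{j=1}^{n} I_{W(B)}(X_j) Z_B(X_j)| \le |N-n| 2M\beta_2 \alpha k/n < \varepsilon_2 k$.

It follows that

(25)  $P\{\sum_{B \in \Psi_i} \sum_{j=1}^{n} I_{W(B)}(X_j) Z_B(X_j) \ge 2k\varepsilon_2\}$
      $\le P\{|N-n| \ge n\varepsilon_3\} + P\{\sum_{B \in \Psi_i} \sum_{j=1}^{N} I_{W(B)}(X_j) Z_B(X_j) > k\varepsilon_2\}$.

It is easy to show that

(26)  $P\{|N-n| \ge n\varepsilon_3\} < e^{-C_{14} n}$.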
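One standard way to obtain an exponential tail bound of this kind for a Poisson variable N with parameter n is the Chernoff argument, sketched here for a deviation level $\epsilon \in (0,1)$ playing the role of $\varepsilon_3$ (the case $\epsilon \ge 1$ only makes the lower tail smaller). This is a generic derivation, not necessarily the one the author had in mind.

```latex
\begin{align*}
E\,e^{tN} &= \exp\{ n (e^{t} - 1) \}, \qquad t \in \mathbb{R}, \\
P\{ N - n \ge n\epsilon \}
  &\le \inf_{t > 0} \exp\{ n(e^{t} - 1) - t n (1 + \epsilon) \}
   = \exp\{ -n [ (1+\epsilon)\log(1+\epsilon) - \epsilon ] \}, \\
P\{ N - n \le -n\epsilon \}
  &\le \inf_{t > 0} \exp\{ n(e^{-t} - 1) + t n (1 - \epsilon) \}
   = \exp\{ -n [ (1-\epsilon)\log(1-\epsilon) + \epsilon ] \}.
\end{align*}
```

Both bracketed quantities are strictly positive for $\epsilon \in (0,1)$, so the sum of the two tails is at most $2e^{-cn}$ with $c > 0$ the smaller of the two brackets, which is below $e^{-C_{14} n}$ for large n and any $C_{14} < c$.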

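Since $W(B)$, $B \in \Psi_i$, are disjoint, we see that for $t > 0$,

(27)  $P\{\sum_{B \in \Psi_i} \sum_{j=1}^{N} I_{W(B)}(X_j) Z_B(X_j) > k\varepsilon_2\}$
      $\le e^{-t\varepsilon_2 k} \sum_{l=0}^{\infty} e^{-n} \frac{n^l}{l!} (E\{\exp(t \sum_{B \in \Psi_i} I_{W(B)}(X_1) Z_B(X_1))\})^l$
      $= e^{-t\varepsilon_2 k} e^{-n} \sum_{l=0}^{\infty} \frac{n^l}{l!} (\sum_{B \in \Psi_i} \int_{W(B)} e^{tZ_B(u)} Q(du) + 1 - Q(\bigcup_{B \in \Psi_i} W(B)))^l$
      $= \exp\{-t\varepsilon_2 k + n \sum_{B \in \Psi_i} \int_{W(B)} (e^{tZ_B(u)} - 1) Q(du)\}$.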

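Now we proceed to show that

(28)  $\limsup_{n \to \infty} \sum_{B \in \Psi_i} \int_{W(B)} [\exp(\frac{n}{k} Z_B(u)) - 1] Q(du) = 0$.

By (23), there exist constants $C_{15}$, $C_{16}$ such that

$\frac{n}{k} Z_B(u) \le C_{15}$,  $\exp(\frac{n}{k} Z_B(u)) - 1 \le C_{16} \frac{n}{k} Z_B(u)$.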


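To prove (28), it suffices to show that

(29)  $\limsup_{n \to \infty} \frac{n}{k} \sum_{B \in \Psi_i} \int_{B \cap E} Q(dx) \int_{W(B)} |m(u) - m(x)| Q(du) = 0$.

Assume that $B \in \Psi_i$, $B \cap E \ne \emptyset$ and $x \in B \cap E$; then $W(B) \subset S_{x,5h}$, where $h = (\alpha k/n)^{1/d}$. Since $x \in E \subset E^*$, by (9) we have, for n so large that $5h < 1$,

$Q(S_{x,5h}) \le \beta_2 (10h)^d = 10^d \beta_2 \alpha k/n$.

Put $C_{17} = 10^d \beta_2 \alpha$; then

(30)  $\frac{n}{k} \sum_{B \in \Psi_i} \int_{B \cap E} Q(dx) \int_{W(B)} |m(u) - m(x)| Q(du)$
      $\le C_{17} \sum_{B \in \Psi_i} \int_{B \cap E} Q(dx) \{\int_{S_{x,5h}} |m(u) - m(x)| Q(du) / Q(S_{x,5h})\}$
      $\le C_{17} \int Q(dx) \{\int_{S_{x,5h}} |m(u) - m(x)| Q(du) / Q(S_{x,5h})\}$.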

By Lemma 4, for almost all x (Q),

$\lim_{n \to \infty} \int_{S_{x,5h}} |m(u) - m(x)| Q(du) / Q(S_{x,5h}) = 0$.

Further, for $x \in S(Q)$, the support of Q, we have

$\int_{S_{x,5h}} |m(u) - m(x)| Q(du) / Q(S_{x,5h}) \le 2M$.

Hence, by the dominated convergence theorem, (29) is valid. Thus (28) is proved.

Taking $t = n/k$ in (27) and using (28), we obtain

(31)  $P\{\sum_{B \in \Psi_i} \sum_{j=1}^{N} I_{W(B)}(X_j) Z_B(X_j) > k\varepsilon_2\} \le \exp\{-\varepsilon_2 n + o(n)\} < e^{-C_{18} n}$.


From  (25),  (26)  and  (31),  it  follows  that  (24)  holds,  and  (17)  is 
valid.  From  (16)  and  (17),  Theorem  1  is  proved. 